Fast gap-free enumeration of conformations and sequences for protein design.
Authors
Abstract
Despite significant successes in structure-based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest-energy structures and sequences are found. DEE/A*-based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap-free list of low-energy protein conformations, which is necessary for ensemble-based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*-based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs.
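The gap-free enumeration described in the abstract can be illustrated with a minimal A* sketch. It is not the authors' implementation. It assumes a pairwise-decomposable energy (a per-rotamer term plus a term per residue pair), expands residue positions in a fixed static order rather than the dynamic ordering the paper proposes, and uses a simple admissible lower bound rather than the improved bounds. The names `conf_energy`, `lower_bound`, and `enumerate_conformations`, and the `single`/`pair` data layout, are hypothetical and chosen for illustration only.

```python
import heapq
import itertools


def conf_energy(conf, single, pair):
    """Energy of a (possibly partial) assignment: conf[i] is the rotamer at position i."""
    e = sum(single[i][r] for i, r in enumerate(conf))
    e += sum(pair[(i, j)][conf[i]][conf[j]]
             for i in range(len(conf)) for j in range(i + 1, len(conf)))
    return e


def lower_bound(conf, single, pair):
    """Admissible bound on the energy contributed by the unassigned positions.

    For each unassigned position j, take the best rotamer s counting its own
    energy, its interactions with assigned positions, and the minimum possible
    interaction with each later unassigned position.
    """
    n, k = len(single), len(conf)
    h = 0.0
    for j in range(k, n):
        best = float("inf")
        for s in range(len(single[j])):
            e = single[j][s]
            e += sum(pair[(i, j)][conf[i]][s] for i in range(k))
            e += sum(min(pair[(j, m)][s][t] for t in range(len(single[m])))
                     for m in range(j + 1, n))
            best = min(best, e)
        h += best
    return h


def enumerate_conformations(single, pair):
    """Yield (energy, conformation) pairs in nondecreasing energy order."""
    n = len(single)
    tie = itertools.count()  # tie-breaker so the heap never compares tuples of rotamers
    heap = [(lower_bound((), single, pair), next(tie), ())]
    while heap:
        f, _, conf = heapq.heappop(heap)
        if len(conf) == n:
            # The bound is zero for a full assignment, so f is the exact energy.
            yield f, conf
            continue
        pos = len(conf)  # static ordering: always expand the next position
        for r in range(len(single[pos])):
            child = conf + (r,)
            g = conf_energy(child, single, pair)
            heapq.heappush(heap, (g + lower_bound(child, single, pair), next(tie), child))


# Toy problem (made-up numbers): 3 positions with 2, 3, and 2 rotamers.
single = [[0, 2], [1, 0, 3], [2, 1]]
pair = {
    (0, 1): [[1, 0, 2], [0, 3, 1]],
    (0, 2): [[0, 1], [2, 0]],
    (1, 2): [[1, 0], [0, 2], [1, 1]],
}
for energy, conf in itertools.islice(enumerate_conformations(single, pair), 3):
    print(energy, conf)
```

Because the bound never overestimates the energy of any completion, full conformations leave the priority queue in nondecreasing energy order, so stopping after any number of them yields a list with no missing lower-energy conformation. A tighter bound or a better expansion order reduces how many partial nodes are pushed before each full conformation appears, which is the kind of efficiency gain the abstract targets.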
Similar resources
Statistical mechanics of protein folding by exhaustive enumeration.
It is hard to construct theories for the folding of globular proteins because they are large and complicated molecules having enormous numbers of nonnative conformations and having native states that are complicated to describe. Statistical mechanical theories of protein folding are constructed around major simplifying assumptions about the energy as a function of conformation and/or simplifica...
BWM*: A Novel, Provable, Ensemble-Based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design
Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks bett...
Exact Statistical Mechanical Investigation of a Finite Model Protein in its environment: A Small System Paradigm
We consider a general incompressible finite model protein of size M in its environment, which we represent by a semiflexible copolymer consisting of amino acid residues classified into only two species (H and P, see text) following Lau and Dill. We allow various interactions between chemically unbonded residues in a given sequence χ and the solvent (water), and exactly enumerate the number o...
Rapid purification of HU protein from Halobacillus karajensis
The histone-like protein HU is the most-abundant DNA-binding protein in bacteria. The HU protein non-specifically binds and bends DNA as a hetero- or homodimer, and can participate in DNA supercoiling and DNA condensation. It also takes part in DNA functions such as replication, recombination, and repair. HU does not recognize any specific sequences but shows a certain degree of specificity to ...
Analysis and Professional Designing of COBRA (Computationally Optimized Broadly Reactive Antigen) Vaccine for Bm86 midgut Protein of R. microplus and R. annulatus Ticks
Introduction: The cattle tick Rhipicephalus spp. causes significant economic losses due to diseases in animals and humans. Bm86 is a midgut protein and vaccine candidate whose sequences vary among geographically separated isolates of Rhipicephalus spp.; this variability is the main reason for the reduced effectiveness, and subsequent failure, of recombinant vaccines. Method: In this bi...
Journal title:
- Proteins
Volume 83, Issue 10
Pages -
Publication date 2015